12 research outputs found

Robust query processing for linked data fragments

Author: Acosta Maribel
Heling Lars
Publication venue: IOS Press
Publication date: 14/07/2022

Linked Data Fragments (LDFs) refer to interfaces that allow for publishing and querying Knowledge Graphs on the Web. These interfaces primarily differ in their expressivity and allow for exploring different trade-offs when balancing the workload between clients and servers in decentralized SPARQL query processing. To devise efficient query plans, clients typically rely on heuristics that leverage the metadata provided by the LDF interface, since obtaining fine-grained statistics from remote sources is a challenging task. However, these heuristics are prone to estimation errors based on the metadata, which can lead to inefficient query executions with a high number of requests, large amounts of data transferred, and, consequently, excessive execution times. In this work, we investigate robust query processing techniques for Linked Data Fragment clients to address these challenges. We first focus on robust plan selection by proposing CROP, a query plan optimizer that explores the cost and robustness of alternative query plans. Then, we address robust query execution by proposing a new class of adaptive operators: Polymorphic Join Operators. These operators adapt their join strategy in response to possible cardinality estimation errors. The results of our first experimental study show that CROP outperforms state-of-the-art clients by exploring alternative plans based on their cost and robustness. In our second experimental study, we investigate how different planning approaches can benefit from polymorphic join operators and find that they enable more efficient query execution in the majority of cases.

KITopen

Characteristic sets profile features: Estimation and application to SPARQL query planning

Author: Acosta Maribel
Heling Lars
Kirrane Sabrina
Ngonga Ngomo Axel-Cyrille
Publication venue: IOS Press
Publication date: 13/07/2023

RDF dataset profiling is the task of extracting a formal representation of a dataset’s features. Such features may cover various aspects of the RDF dataset, ranging from information on licensing and provenance to statistical descriptors of the data distribution and its semantics. In this work, we focus on the characteristic sets profile features that capture both structural and semantic information of an RDF dataset, making them a valuable resource for different downstream applications. While previous research demonstrated the benefits of characteristic sets in centralized and federated query processing, access to these fine-grained statistics is taken for granted. However, especially in federated query processing, computing this profile feature is challenging, as it can be difficult and/or costly to access and process the entire data from all federation members. We address this shortcoming by introducing the concept of a profile feature estimation and propose a sampling-based approach to generate estimations for the characteristic sets profile feature. In addition, we showcase the applicability of these feature estimations in federated querying by proposing a query planning approach that is specifically designed to leverage these feature estimations. In our first experimental study, we intrinsically evaluate our approach on the representativeness of the feature estimation. The results show that even small samples of just 0.5% of the original graph’s entities allow for estimating both structural and statistical properties of the characteristic sets profile features. Our second experimental study extrinsically evaluates the estimations by investigating their applicability in our query planner using the well-known FedBench benchmark. The results of the experiments show that the estimated profile features allow for obtaining efficient query plans.

KITopen

An Infrastructure for Spatial Linking of Survey Data

Author: Acosta Maribel
Bensmann Felix
Goebel Jan
Heling Lars
Jünger Stefan
Meinel Gotthard
Mucha Loren
Sikder Sujit
Sure-Vetter York
Zapilko Benjamin
Publication venue: Ubiquity Press, Ltd.
Publication date: 01/01/2020

Research on environmental justice comprises health and well-being aspects, as well as topics related to general social participation. In this research field, among others, there is a need for an integrated use of social science survey data and spatial science data, e.g. for combining demographic information from survey data with data on pollution from spatial data. However, it is challenging for researchers to link both data sources, because (1) the interdisciplinary nature of both data sources is different, (2) both are subject to different legal restrictions, in particular regarding data privacy, and (3) methodological challenges arise regarding the use of geo-information systems (GIS) for the processing and analysis of spatial data. In this article, we present an infrastructure of distributed web services which supports researchers in the process of spatial linking. The infrastructure addresses the challenges researchers have to face during that process. We present an example case study on the investigation of environmental inequalities with regard to income and land use hazards in Germany by using georeferenced survey data of the GESIS Panel and the German Socio-economic Panel (SOEP), and by using spatial data from the Monitor of Settlement and Open Space Development (IOER Monitor). The results show that increasing income of survey respondents is associated with less exposure to land-use-related environmental hazards in Germany.

SSOAR - Social Science Open Access Repository

SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

Author: Acosta Maribel
Aluç Güneş
Aranda Carlos Buil
Bonatti Piero Andrea
Buil-Aranda Carlos
Erling Orri
Hartig Olaf
Hasnain Ali
Heling Lars
Hernández-Illera A.
Martínez-Prieto M.A.
Meimaris M.
Polleres Axel
Saleem Muhammad
Publication venue: ACM Digital Library
Publication date: 01/01/2020

While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper, we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs.

Crossref

KITopen

ISWC TPF Profiler Study Data

Author: Lars Heling (4751859)

Results of the studies conducted using the TPF Profiler to assess the performance of TPF servers.

FigShare

Robust query processing for Linked Data Fragments

Author: Acosta Maribel (Dr. rer. nat.)
Heling Lars (Dr.-Ing.)
Publication date: 09/10/2021

Linked Data Fragments (LDFs) refer to interfaces that allow for publishing and querying Knowledge Graphs on the Web. These interfaces primarily differ in their expressivity and allow for exploring different trade-offs when balancing the workload between clients and servers in decentralized SPARQL query processing. To devise efficient query plans, clients typically rely on heuristics that leverage the metadata provided by the LDF interface, since obtaining fine-grained statistics from remote sources is a challenging task. However, these heuristics are prone to estimation errors based on the metadata, which can lead to inefficient query executions with a high number of requests, large amounts of data transferred, and, consequently, excessive execution times. In this work, we investigate robust query processing techniques for Linked Data Fragment clients to address these challenges. We first focus on robust plan selection by proposing CROP, a query plan optimizer that explores the cost and robustness of alternative query plans. Then, we address robust query execution by proposing a new class of adaptive operators: Polymorphic Join Operators. These operators adapt their join strategy in response to possible cardinality estimation errors. The results of our first experimental study show that CROP outperforms state-of-the-art clients by exploring alternative plans based on their cost and robustness. In our second experimental study, we investigate to what extent different planning approaches can benefit from polymorphic join operators and find that they enable more efficient query execution in the majority of cases.

Dokumentenrepositorium der RUB / RUB-Repository

Modelling of radiocesium in lakes: The VAMP model

Author: Bergström Ulla
Brittain John
Heling Rudie
Håkanson Lars
Monte Luigi
Suolanen Vesa
Publication venue: Elsevier BV
Publication date: 01/01/1996

VTT Research System

Modelling of normalized consequences of ¹³⁷Cs deposition in various aquatic environments

Author: Bergström U.
Brittain John
Heling Rudie
Håkanson Lars
Monte Luigi
Suolanen Vesa
Publication venue: International Atomic Energy Agency IAEA

VTT Research System
